Last modified time: November 11, 2021

1. Overview of PM2.5 and Asthma in the Bay Area

The above figure shows the PM2.5 concentration of each census tract in the Bay Area. PM2.5 concentrations are higher in tracts close to the inner bay (such as Berkeley, Richmond, Vallejo, and San Francisco) than in other tracts. Some tracts in the eastern part of the Bay Area also have higher PM2.5 concentrations.

PM2.5 indicator: Annual mean concentration of PM2.5 (weighted average of measured monitor concentrations and satellite observations, μg/m³), averaged over three years (2015 to 2017).

This figure shows the distribution of asthma rates in the Bay Area. Some areas near the inner bay (such as Vallejo, Richmond, and Oakland) and the northeastern part of the Bay Area have higher asthma rates.

Asthma indicator: Spatially modeled, age-adjusted rate of emergency department (ED) visits for asthma per 10,000 people (averaged over 2015 to 2017).
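To show what "age-adjusted" means, here is a minimal sketch of direct age standardization, one common way to compute such a rate. This report does not say which method or standard population the data source used, and the spatial-modeling step is not reproduced. The age groups, counts, and standard-population weights below are all hypothetical.

```r
# Hypothetical age-specific counts for ONE census tract (made-up numbers)
ed_visits <- c(age_0_17 = 30, age_18_64 = 40, age_65_plus = 10)
tract_pop <- c(age_0_17 = 2000, age_18_64 = 6000, age_65_plus = 2000)

# Hypothetical standard-population shares (an assumption; they must sum to 1)
std_weights <- c(age_0_17 = 0.25, age_18_64 = 0.60, age_65_plus = 0.15)

# Direct age standardization: weight each age-specific rate by the share of
# that age group in the standard population, then scale to "per 10,000".
age_adjusted_rate <- function(events, pop, weights, per = 10000) {
  stopifnot(identical(names(events), names(pop)),
            identical(names(pop), names(weights)),
            abs(sum(weights) - 1) < 1e-8)
  age_specific <- events / pop
  sum(weights * age_specific) * per
}

crude_rate <- sum(ed_visits) / sum(tract_pop) * 10000                  # 80 per 10,000
adjusted_rate <- age_adjusted_rate(ed_visits, tract_pop, std_weights)  # 85 per 10,000
```

The adjusted rate differs from the crude rate because the tract's age mix differs from the standard population. Adjusting lets tracts with different age structures be compared on the same footing.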

2. Correlation analysis of Asthma and PM2.5

2.1 Asthma ~ PM2.5

## 
## Call:
## lm(formula = Asthma ~ PM2.5, data = bay_pm25_Asthma)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54.47 -25.89  -9.61  12.94 182.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.49 on 1578 degrees of freedom
## Multiple R-squared:  0.09606,    Adjusted R-squared:  0.09549 
## F-statistic: 167.7 on 1 and 1578 DF,  p-value: < 2.2e-16

An increase of 1 μg/m³ in PM2.5 is associated with an increase of 19.862 asthma ED visits per 10,000 people. PM2.5 explains 9.6% of the variation in Asthma (R² = 0.096).
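The slope interpretation can be checked by plugging values into the fitted line. This is a minimal sketch that uses only the coefficients printed in the summary above. The PM2.5 values are hypothetical, and `predict_asthma` is an illustrative helper, not part of the original analysis.

```r
# Coefficients copied from the lm(Asthma ~ PM2.5) summary above
b0 <- -116.278  # intercept
b1 <- 19.862    # PM2.5 slope (ED visits per 10,000 per 1 ug/m3)

# Hypothetical helper: the fitted straight line
predict_asthma <- function(pm25) b0 + b1 * pm25

pm_values <- c(8, 9, 10)             # hypothetical PM2.5 levels (ug/m3)
predicted <- predict_asthma(pm_values)
predicted                            # 42.618 62.480 82.342
diff(predicted)                      # 19.862 19.862: each 1-unit step adds b1
predict_asthma(0)                    # -116.278: a negative rate, i.e. extrapolation
```

With the real fitted object, `predict(model, newdata = data.frame(PM2.5 = pm_values))` gives the same numbers. The negative intercept means the line is only meaningful near the observed PM2.5 values, not at zero.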

## [1] -0.0001923675

As shown above, the density of the residuals is not normal. Although the mean of the residuals is close to zero, the distribution is clearly right-skewed. More than half of the residuals are negative (the median is -9.61), while a long tail extends to large positive values (the maximum is 182.95). Because the residuals are strongly skewed, the conditions needed to meaningfully interpret the regression results on this data are not met.
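Skewness can be checked numerically as well as from the density curve. This is a minimal sketch on simulated data. The `bay_pm25_Asthma` data are not reproduced here, so hypothetical right-skewed residuals (centered log-normal draws) stand in for `residuals(lm(Asthma ~ PM2.5, data = bay_pm25_Asthma))`. `sample_skewness` is an illustrative helper that computes the moment-based skewness in base R.

```r
set.seed(1)
# Hypothetical stand-in for the model residuals: right-skewed, mean zero
raw <- rlnorm(1580, meanlog = 0, sdlog = 0.8)
res <- raw - mean(raw)

# Moment-based sample skewness: positive values indicate a long right tail
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

mean(res)                # ~0 by construction
sample_skewness(res)     # clearly > 0 for right-skewed residuals
median(res) < mean(res)  # TRUE: the bulk sits below zero, the tail pulls the mean up
# plot(density(res))     # draws the kind of density curve discussed above
```

A near-zero mean does not rule out skew. The sign of the skewness and the gap between median and mean are what reveal the long right tail.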

2.2 log(Asthma) ~ PM2.5

## 
## Call:
## lm(formula = Asthma_log ~ PM2.5, data = bay_pm25_Asthma)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00402 -0.46479  0.03313  0.42298  1.75525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.69234    0.22840   3.031  0.00248 ** 
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6566 on 1578 degrees of freedom
## Multiple R-squared:  0.1003, Adjusted R-squared:  0.09974 
## F-statistic: 175.9 on 1 and 1578 DF,  p-value: < 2.2e-16

An increase of 1 μg/m³ in PM2.5 is associated with Asthma being multiplied by e^0.356 ≈ 1.43, i.e., roughly 43% higher. PM2.5 explains 10% of the variation in Asthma_log.
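The multiplicative interpretation comes from exponentiating the slope. This is a minimal sketch that uses the coefficient, standard error, and degrees of freedom printed above. It assumes `Asthma_log` is the natural log of Asthma, which is what R's `log()` computes.

```r
b1_log <- 0.35633   # PM2.5 slope from the lm(Asthma_log ~ PM2.5) summary above
se_b1  <- 0.02686   # its standard error
df_res <- 1578      # residual degrees of freedom

ratio <- exp(b1_log)             # multiplicative change in Asthma per 1 ug/m3
pct_change <- (ratio - 1) * 100  # the same effect as a percentage
round(ratio, 3)                  # 1.428
round(pct_change, 1)             # 42.8

# Approximate 95% interval for the ratio: exponentiate the coefficient's CI
t_crit <- qt(0.975, df = df_res)
ratio_ci <- exp(b1_log + c(-1, 1) * t_crit * se_b1)
ratio_ci
```

On the real fitted object, `exp(confint(model))` gives the same interval directly. Because the effect is multiplicative, a 2-unit increase multiplies Asthma by `ratio^2`, not `2 * ratio`.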

## [1] 6.974443e-06

As shown above, the density of the residuals is close to a normal distribution, although a small dip appears in the curve near the mean (0). This suggests that the log model is a reasonable fit for the linear relationship between log(Asthma) and PM2.5 in most cases. However, some information is still unexplained: PM2.5 alone explains only about 10% of the variation in Asthma_log.
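The visual comparison of the two residual densities can be backed by a formal normality test. This sketch swaps in the Shapiro-Wilk test (`shapiro.test` in base R's stats package), which the original analysis does not use. The residuals are simulated stand-ins: roughly normal ones for the log model and centered exponential (right-skewed) ones for the untransformed model. With the real data they would come from `residuals()` on each fitted `lm`.

```r
set.seed(2)
# Hypothetical stand-ins for the two sets of residuals (n = 1580, as above)
res_log <- rnorm(1580, mean = 0, sd = 0.66)   # roughly normal, like the log model
raw <- rexp(1580, rate = 1 / 30)
res_lin <- raw - mean(raw)                    # right-skewed, like the linear model

# Shapiro-Wilk test: a small p-value is evidence against normality
p_log <- shapiro.test(res_log)$p.value
p_lin <- shapiro.test(res_lin)$p.value
c(log_model = p_log, linear_model = p_lin)

# qqnorm(res_log); qqline(res_log)  # visual check of the same assumption
```

With about 1,580 tracts the test can flag even small departures from normality, so it is best read together with a Q-Q plot or the density curve rather than on its own.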

As shown in the table bay_residual and the map above, the census tract with the most negative residual is 6085513000, located approximately at Stanford. Since residual = Asthma_log - best_fit_candidate, where best_fit_candidate is the value predicted by the log model, the most negative residual means that this tract's Asthma_log is much lower than the model expects. This might be because Stanford's population is younger and less likely to suffer from the disease (although the asthma indicator is already age-adjusted, which should reduce this effect), or because Stanford has implemented better controls on asthma.
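Locating the most negative residual comes down to computing `Asthma_log - best_fit_candidate` and taking the minimum. This is a minimal sketch on a hypothetical five-tract data frame. The tract labels and values are made up, and `bay_toy` and `tract_id` are illustrative names, not objects from the original analysis.

```r
# Hypothetical toy data (labels and values are made up for illustration)
bay_toy <- data.frame(
  tract_id   = c("tract_A", "tract_B", "tract_C", "tract_D", "tract_E"),
  PM2.5      = c(8, 9, 10, 11, 12),
  Asthma_log = c(3.5, 3.8, 2.9, 4.6, 5.0),
  stringsAsFactors = FALSE
)

fit <- lm(Asthma_log ~ PM2.5, data = bay_toy)

# residual = observed Asthma_log minus the log model's prediction
bay_toy$best_fit_candidate <- fitted(fit)
bay_toy$residual <- bay_toy$Asthma_log - bay_toy$best_fit_candidate

# Sort like a residual table and pick the tract with the most negative residual
bay_sorted <- bay_toy[order(bay_toy$residual), ]
worst <- bay_toy$tract_id[which.min(bay_toy$residual)]
worst   # "tract_C": observed 2.9 vs. predicted 3.96
```

The hand-computed column matches `residuals(fit)`. Building it explicitly keeps the sign convention visible: a negative value means the observed rate is below the model's prediction.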